10 research outputs found
Achieving Synergy in Cognitive Behavior of Humanoids via Deep Learning of Dynamic Visuo-Motor-Attentional Coordination
The current study examines how adequate coordination among different
cognitive processes, including visual recognition, attention switching, action
preparation and generation, can be developed through robot learning by
introducing a novel model, the Visuo-Motor Deep Dynamic Neural Network (VMDNN).
The proposed model is built on coupling of a dynamic vision network, a motor
generation network, and a higher level network allocated on top of these two.
Simulation experiments using the iCub simulator were conducted on
cognitive tasks including visual object manipulation in response to human
gestures. The results showed that synergetic coordination can be developed via
iterative learning through the whole network when a spatio-temporal hierarchy and
a temporal hierarchy are self-organized in the visual pathway and in the motor
pathway, respectively, such that the higher level can manipulate them with
abstraction. Comment: submitted to 2015 IEEE-RAS International Conference on Humanoid
Robots
Agreement Study Using Gesture Description Analysis
Choosing adequate gestures for touchless interfaces is a challenging task that has a direct impact on human-computer interaction. Such gestures are commonly determined by the designer, or by ad-hoc, rule-based or agreement-based methods. Previous approaches to assessing agreement grouped the gestures into equivalence classes and ignored the integral properties that are shared between them. In this work, we propose a generalized framework that inherently incorporates the gesture descriptors into the agreement analysis (GDA). In contrast to previous approaches, we represent gestures using binary description vectors and allow them to be partially similar. In this context, we introduce a new metric referred to as Soft Agreement Rate (SAR) to measure the level of agreement and provide a mathematical justification for this metric. Further, we performed computational experiments to study the behavior of SAR and demonstrate that existing agreement metrics are a special case of our approach. Our method was evaluated and tested through a guessability study conducted with a group of neurosurgeons. Nevertheless, our formulation can be applied to any other user-elicitation study. Results show that the level of agreement obtained by SAR is 2.64 times higher than that obtained by the previous metrics. Finally, we show that our approach complements the existing agreement techniques by generating an artificial lexicon based on the most agreed properties.
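The idea of letting binary description vectors be partially similar can be illustrated with a short sketch. This is not the paper's SAR formula, which the abstract does not give. It assumes agreement is scored as the mean pairwise similarity between participants' proposals, with Jaccard similarity as the soft measure. The function names and the four-descriptor example vectors are hypothetical.

```python
from itertools import combinations


def jaccard_similarity(a, b):
    """Jaccard similarity of two equal-length binary descriptor vectors."""
    if len(a) != len(b):
        raise ValueError("descriptor vectors must have the same length")
    union = sum(1 for x, y in zip(a, b) if x or y)
    if union == 0:
        return 1.0  # assumption: two all-zero vectors count as identical
    intersection = sum(1 for x, y in zip(a, b) if x and y)
    return intersection / union


def hard_similarity(a, b):
    """Equivalence-class view: 1 if the gestures are identical, else 0."""
    return 1.0 if tuple(a) == tuple(b) else 0.0


def pairwise_agreement(proposals, similarity):
    """Mean similarity over all unordered pairs of proposals."""
    pairs = list(combinations(proposals, 2))
    if not pairs:
        raise ValueError("need at least two proposals")
    return sum(similarity(a, b) for a, b in pairs) / len(pairs)


# Hypothetical proposals from four participants for one command,
# each described by four binary descriptors.
proposals = [(1, 1, 0, 0), (1, 1, 0, 0), (1, 0, 0, 0), (0, 0, 1, 1)]

hard = pairwise_agreement(proposals, hard_similarity)
soft = pairwise_agreement(proposals, jaccard_similarity)
print(f"hard agreement: {hard:.4f}")
print(f"soft agreement: {soft:.4f}")
```

With `hard_similarity`, only identical proposals contribute, which corresponds to grouping gestures into equivalence classes. With `jaccard_similarity`, the third proposal also earns partial credit for the descriptor it shares with the first two, so the soft score is at least as high as the hard one.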
Fazt: Few and Zero-Shot Framework to Learn Tempo-Visual Events from Little or no Data
Supervised classification methods based on deep learning have achieved great success in many domains and tasks that were previously unimaginable. Such approaches build on learning paradigms that require hundreds of examples in order to learn to classify objects or events. Thus, their immediate application to domains with few or no observations is limited. This is because they lack the ability to rapidly generalize to new categories from a few examples or from high-level descriptions of categories. This can be attributed to the significant gap between the way machines represent knowledge and the way humans represent categories in their minds and learn to recognize them. In this context, this research represents categories as semantic trees in a high-level attribute space and proposes an approach to utilize these representations to conduct N-Shot, Few-Shot, One-Shot, and Zero-Shot Learning (ZSL). This work refers to this paradigm as the problem of general classification (GCP) and proposes a unified framework for GCP referred to as the Few and Zero-Shot Technique (FAZT). The FAZT framework is an end-to-end approach that uses trainable 3D convolutional neural networks and recurrent neural networks to simultaneously optimize for both the semantic and the classification tasks. Lastly, the problem of systematically obtaining semantic attributes by utilizing domain-specific ontologies is presented. The proposed framework is validated in the domains of hand gesture and action/activity recognition; however, this research can be applied to other domains such as video understanding, the study of human behavior, and emotion recognition. First, an attribute-based dataset for gestures is developed in a systematic manner by relying on literature in gestures and semantics, and crowdsourced platforms such as Amazon Mechanical Turk. To the best of our knowledge, this is the first ZSL dataset for hand gestures (ZSGL dataset).
Next, our framework is evaluated in two experimental conditions: (1) within-category (to test the attribute recognition power) and (2) across-category (to test the ability to recognize an unknown category). In addition, we conducted experiments in zero-shot, one-shot, few-shot and continuous learning conditions in both open-set and closed-set scenarios. Results showed that our framework performs favorably on the ZSGL, Kinetics, UIUC Action, UCF101 and HMDB51 action datasets in all the experimental conditions.
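Attribute-based zero-shot recognition, as described in the abstract above, can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the FAZT framework itself. It uses flat binary attribute signatures instead of semantic trees, and it takes the attribute scores as given where the paper would produce them with its 3D convolutional and recurrent networks. All category names, attributes, and scores are hypothetical.

```python
def predict_category(attribute_scores, signatures):
    """Return the category whose attribute signature is nearest to the scores.

    attribute_scores: per-attribute scores in [0, 1] (assumed to come from
        some attribute recognizer; here they are supplied by hand).
    signatures: dict mapping category name -> binary attribute tuple.
    """
    def squared_distance(signature):
        return sum((s - a) ** 2 for s, a in zip(attribute_scores, signature))

    return min(signatures, key=lambda name: squared_distance(signatures[name]))


# Hypothetical attributes: (hand moves, fingers spread, leftward, rightward).
signatures = {
    "swipe_left": (1, 0, 1, 0),
    "swipe_right": (1, 0, 0, 1),  # imagine no training videos exist for this one
    "zoom_in": (0, 1, 0, 0),
}

# Scores an attribute recognizer might output for a new clip (made up).
scores = (0.9, 0.1, 0.2, 0.8)
print(predict_category(scores, signatures))
```

Because a category is identified only through its attribute signature, a category with no training examples can still be predicted as long as its signature is known, which is the property zero-shot learning relies on.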
The complete set of 55 semantic descriptors, grouped by category.
Consensus measured by Metric I (state of the art) and Metric II (the Jaccard distance using semantic descriptors).
This form contains a list of 34 commands.
Each command is highlighted in gray. The rectangle to the left of the command corresponds to the context of the gesture, and the 2-4 rectangles to the right correspond to the modifiers.